R: A Powerful Language for Data Analysis

Mohd Azmi

Introduction: R

  • R is a programming language specifically designed for statistical computing and data analysis.
  • It was created by statisticians and data analysts for tasks ranging from basic data manipulation to advanced statistical modeling.
  • Widely used in academia, industry, and research for its flexibility and extensive package ecosystem.

R Key Features

  • Open-source and freely available.
  • Rich statistical and graphical capabilities.
  • Active and supportive user community.
  • Cross-platform compatibility (Windows, macOS, Linux).

R Use Cases

  • Data cleaning and manipulation.
  • Statistical analysis and hypothesis testing.
  • Data visualization with customizable plots.

Installing R

  • If you already have R and RStudio, updating to the latest versions is recommended.
  • Download R from the CRAN website (https://cran.r-project.org).
  • Run the installer.
  • Accept the default settings.
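
Once R is installed, one quick way to confirm which version is running is to ask R itself from the console. This is a minimal sketch: the version string differs from machine to machine, and 4.1.0 is only an example threshold, not a requirement of this course.

```{r}
# Print the version of R that is currently running
R.version.string

# getRversion() returns a version object that can be compared,
# e.g. to check for a minimum version (4.1.0 is just an example)
getRversion() >= "4.1.0"
```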

Basics (1)

  • R can evaluate expressions directly, like a calculator.
```{r}
1 + 2
```
[1] 3
  • We can assign values to variables with the assignment operator <-
```{r}
one <- 1
two <- 2

one + two
```
[1] 3
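
A variable keeps its value until it is reassigned, so results can be stored and built upon step by step. A small sketch; the name total is just an illustration:

```{r}
one <- 1
two <- 2

# Store the result of a calculation in a new variable
total <- one + two

# Reassigning a variable replaces its old value
total <- total * 10
total  # 30
```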

Data Types (2)

  • R recognizes various data types; the class() function shows which type a value has
```{r}
class(c(1, 2, 3))
```
[1] "numeric"
```{r}
class(c("azmi", "kimsui", "liana"))
```
[1] "character"
```{r}
class(c(TRUE, FALSE, TRUE, TRUE))
```
[1] "logical"

Other data types include integer, Date, date-time (POSIXct), and factor.
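
These other types can be created and checked with class() in the same way. A minimal sketch; the dates and factor values below are arbitrary examples:

```{r}
# The L suffix creates an integer instead of a numeric (double)
class(5L)                                             # "integer"

# Dates are created from "YYYY-MM-DD" strings
class(as.Date("2024-01-15"))                          # "Date"

# Date-times (POSIXct) store a date plus a time of day
class(as.POSIXct("2024-01-15 10:30:00", tz = "UTC"))  # "POSIXct" "POSIXt"

# Factors store categorical values with a fixed set of levels
class(factor(c("M", "F", "M")))                       # "factor"
```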

Operators (3)

  • R has several types of operators
    • arithmetic operators: +, -, *, /, ^
    • comparison operators: ==, !=, >, <, >=, <=
    • logical operators: &, |, !
    • assignment operators: <-, =
    • special operators: %>% (pipe, from the magrittr package), %in% (value matching)
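
A short sketch of each operator family using base R only (the pipe %>% is left out because it needs the magrittr package). The values of x and y are arbitrary:

```{r}
x <- 7
y <- 2

# Arithmetic
x + y   # 9
x / y   # 3.5
x ^ y   # 49

# Comparison: each returns TRUE or FALSE
x > y   # TRUE
x == y  # FALSE

# Logical: combine TRUE/FALSE values
(x > 5) & (y > 5)  # FALSE
(x > 5) | (y > 5)  # TRUE
!(x > 5)           # FALSE

# %in%: is each value on the left found in the vector on the right?
c(2, 4, 7) %in% c(x, y)  # TRUE FALSE TRUE
```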

Vector, Array, List??? (4)

  • vector: a one-dimensional sequence of elements
    • all elements must be of the same data type
```{r}
# Numeric vector
numeric_vector <- c(1, 2, 3, 4, 5)

# Character vector
character_vector <- c("apple", "orange", "banana")

# Logical vector
logical_vector <- c(TRUE, FALSE, TRUE)
```
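
Because a vector holds a single data type, R silently converts (coerces) mixed values to a common type. The sketch below also shows indexing with square brackets; note that R counts positions from 1.

```{r}
numeric_vector <- c(1, 2, 3, 4, 5)

# Indexing starts at 1, not 0
numeric_vector[1]                   # 1
numeric_vector[2:4]                 # 2 3 4
numeric_vector[numeric_vector > 3]  # 4 5 (filtering with a logical condition)

# Arithmetic is applied element by element
numeric_vector * 2                  # 2 4 6 8 10

# Mixing types forces every element to the most flexible type
mixed <- c(1, "apple", TRUE)
class(mixed)  # "character"
mixed         # "1" "apple" "TRUE"
```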

  • Array: multidimensional extension of a vector
    • has two or more dimensions, each with a fixed length (e.g. every row has the same number of columns), and all elements must share one data type
```{r}
# Create a 2x3 array with numeric values
numeric_array <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3))

# Display the array
numeric_array
```
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
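
As the output shows, arrays are filled column by column, and elements are selected with one index per dimension. A minimal sketch with a three-dimensional array; the values 1:24 are arbitrary:

```{r}
# 2 rows x 3 columns x 4 layers = 24 elements
cube <- array(1:24, dim = c(2, 3, 4))

dim(cube)        # 2 3 4

# One index per dimension: row 2, column 3, layer 1
cube[2, 3, 1]    # 6

# Leaving an index empty selects everything along that dimension
cube[, , 2]      # the second 2x3 layer (values 7 to 12)

# A matrix is simply a two-dimensional array
is.array(matrix(1:6, nrow = 2))  # TRUE
```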

  • List: a versatile data structure
    • can hold elements of different data types and different lengths
    • can contain vectors, arrays, other lists, or any combination of these
```{r}
# Create a list with different types of elements
my_list <- list(
  numeric_vector = c(1, 2, 3),
  character_vector = c("apple", "orange"),
  logical_vector = c(TRUE, FALSE),
  numeric_matrix = matrix(1:4, nrow = 2),
  nested_list = list(a = 10, b = "hello")
)
```

```{r}
# Display the list
my_list
```
$numeric_vector
[1] 1 2 3

$character_vector
[1] "apple"  "orange"

$logical_vector
[1]  TRUE FALSE

$numeric_matrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$nested_list
$nested_list$a
[1] 10

$nested_list$b
[1] "hello"
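
Elements of a list are retrieved by name with $ or [[ ]]; single brackets [ ] return a smaller list instead of the element itself. The sketch rebuilds a shorter version of my_list so it runs on its own, and the extra element name note is only an illustration:

```{r}
my_list <- list(
  numeric_vector = c(1, 2, 3),
  nested_list = list(a = 10, b = "hello")
)

# $ and [[ ]] return the element itself
my_list$numeric_vector         # 1 2 3
my_list[["numeric_vector"]]    # same as above

# Chain accessors to reach inside a nested list
my_list$nested_list$b          # "hello"

# Single brackets keep the result wrapped in a list
class(my_list["numeric_vector"])  # "list"

# Elements can be added (or replaced) by assignment
my_list$note <- "added later"
length(my_list)                # 3
```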

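Data Frame (5)

  • the most common data structure for data analysis in R
  • two-dimensional tabular structure: each column is a vector, all columns have the same length, and different columns can hold different data types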
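```{r}
name <- c("azmi", "kimsui", "liana")
# Factor with an explicit level order: M first, then F
gender <- factor(c("M", "M", "F"), levels = c("M", "F"))
score <- c(75, 95, 85)

# Combine equal-length vectors into a data frame, one column each
ncd <- data.frame(name, gender, score)

ncd
```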
    name gender score
1   azmi      M    75
2 kimsui      M    95
3  liana      F    85
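
Once a data frame exists, its columns can be summarised, filtered, and extended. A minimal sketch using the same ncd data; the pass mark of 80 and the column name passed are hypothetical, chosen only for illustration:

```{r}
name <- c("azmi", "kimsui", "liana")
gender <- factor(c("M", "M", "F"), levels = c("M", "F"))
score <- c(75, 95, 85)
ncd <- data.frame(name, gender, score)

# Structure: one line per column with its type
str(ncd)

# A column is a vector, accessed with $
mean(ncd$score)   # 85

# [rows, columns]: keep rows where score is above 80, all columns
ncd[ncd$score > 80, ]

# Add a new column computed from an existing one (80 is a hypothetical pass mark)
ncd$passed <- ncd$score >= 80
ncd
```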

What’s More?

Further concepts in R will be explained using RStudio and Quarto:

  • importing datasets
  • packages
  • data wrangling
  • data visualization

and much more.

Packages

  • R has users and contributors worldwide
  • R is modular and relies on an ecosystem of packages
  • Package: a collection of R functions, data, and compiled code designed to perform a specific set of tasks
  • Packages play a crucial role in extending R's functionality
  • A package must be installed once with install.packages() and loaded in each session with library()
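
The usual workflow is to install a package once per machine and load it in every session that uses it. In this sketch the install and load lines are commented out because they need an internet connection and change the local library; is_installed() is a hypothetical helper name built on the base function requireNamespace():

```{r}
# Install once (downloads from CRAN):
# install.packages("dplyr")

# Load in each new R session before using its functions:
# library(dplyr)

# Hypothetical helper: TRUE if a package can be found on this machine.
# requireNamespace() loads the package namespace without attaching it.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")  # TRUE: stats ships with base R
```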

Common R packages

  1. dplyr: data manipulation and transformation
  2. ggplot2: data visualization
  3. tidyr: data reshaping
  4. caret: machine learning
  5. haven: import data from SAS, SPSS, and Stata